SLO-Driven Right-Sizing and Resource Provisioning of MapReduce Jobs
Authors
Abhishek Verma, University of Illinois at Urbana-Champaign, Urbana, IL, US
Ludmila Cherkasova, Hewlett-Packard Labs, Palo Alto, CA, US
Roy H. Campbell, University of Illinois at Urbana-Champaign, Urbana, IL, US
Venue: LADIS'2011, held in conjunction with VLDB'2011, Seattle, Washington, Sept. 2-3, 2011.
Technical report: HP Laboratories, HPL-2011-126
Keywords: MapReduce; Hadoop; performance models; completion time prediction; resource allocation
External Posting Date: August 21, 2011. Internal Posting Date: August 21, 2011. Approved for External Publication. Copyright 2011 Hewlett-Packard Development Company, L.P.
Abstract
There is an increasing number of MapReduce applications, e.g., personalized advertising, spam detection, and real-time event log analysis, that need to be completed within a given time window. Currently, system administrators lack performance models and workload analysis tools for automated performance management of such MapReduce jobs. In this work, we outline a novel framework for SLO-driven resource provisioning and sizing of MapReduce jobs. First, we propose an automated profiling tool that extracts a compact job profile from past application run(s) or by executing the application on a smaller data set. Then, by applying a linear regression technique, we derive scaling factors to accurately project the application performance when processing a larger dataset. The job profile (with scaling factors) forms the basis of a MapReduce performance model that computes the lower and upper bounds on the job completion time. Finally, we provide a fast and efficient capacity planning model that, for a MapReduce job with timing requirements, generates a set of resource provisioning options. We validate the accuracy of our models by executing a set of realistic applications on a 66-node Hadoop cluster.
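The abstract names two steps that a small example can make concrete: deriving scaling factors by linear regression, and bounding the completion time of a set of tasks. The Python sketch below is illustrative only and is not the authors' tool. The function names, the sample numbers, and the specific bound formulas (the standard bounds for greedy assignment of n tasks to k slots) are assumptions, since the abstract does not state them.

```python
def fit_scaling(sizes, durations):
    """Ordinary least squares fit of duration ~ intercept + slope * size.

    Hypothetical helper: the abstract only says a linear regression
    technique is used to derive scaling factors.
    """
    n = len(sizes)
    if n < 2 or n != len(durations):
        raise ValueError("need at least two (size, duration) pairs")
    mean_x = sum(sizes) / n
    mean_y = sum(durations) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    if sxx == 0:
        raise ValueError("dataset sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, durations))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def makespan_bounds(num_tasks, num_slots, avg_duration, max_duration):
    """Lower and upper bounds on the time to run num_tasks on num_slots.

    Assumed formulas (greedy task assignment), not quoted from the abstract:
      lower = n * avg / k
      upper = (n - 1) * avg / k + max
    """
    if num_tasks < 1 or num_slots < 1:
        raise ValueError("num_tasks and num_slots must be positive")
    lower = num_tasks * avg_duration / num_slots
    upper = (num_tasks - 1) * avg_duration / num_slots + max_duration
    return lower, upper


# Made-up profile: task duration (s) measured at four small input sizes (GB).
intercept, slope = fit_scaling([1, 2, 3, 4], [12.0, 22.0, 32.0, 42.0])
projected_avg = intercept + slope * 10  # projected duration at 10 GB

# Bounds for 64 tasks on 16 slots with made-up average and maximum durations.
low, high = makespan_bounds(64, 16, avg_duration=10.0, max_duration=14.0)
```

With a target completion time, the same bounds could be inverted to search for slot counts that meet it, which is the role the abstract assigns to the capacity planning model.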
Similar Resources
Resource Provisioning Framework for MapReduce Jobs with Performance Goals
Many companies are increasingly using MapReduce for efficient large scale data processing such as personalized advertising, spam detection, and different data mining tasks. Cloud computing offers an attractive option for businesses to rent a suitable size Hadoop cluster, consume resources as a service, and pay only for resources that were utilized. One of the open questions in such environments ...
Big Data Using Hadoop
17ANSP-BD-001 Hadoop Performance Modeling for Job Estimation and Resource Provisioning MapReduce has become a major computing model for data intensive applications. Hadoop, an open source implementation of MapReduce, has been adopted by an increasingly growing user community. Cloud computing service providers such as Amazon EC2 Cloud offer the opportunities for Hadoop users to lease a certain amou...
Towards Optimizing Hadoop Provisioning in the Cloud
Data analytics is becoming increasingly prominent in a variety of application areas ranging from extracting business intelligence to processing data from scientific studies. The MapReduce programming paradigm lends itself well to these data-intensive analytics jobs, given its ability to scale out and leverage several machines to process data in parallel. In this work we argue that such MapReduce-base...
Dynamically Scheduling a Component-Based Framework in Clusters
In many clusters and datacenters, application frameworks are used that offer programming models such as Dryad and MapReduce, and jobs submitted to the clusters or datacenters may be targeted at specific instances of these frameworks, for example because of the presence of certain data. An important question that then arises is how to allocate resources to framework instances that may have highl...
QoS-Based Pricing and Scheduling of Batch Jobs in OpenStack Clouds
The current Cloud infrastructure services (IaaS) market employs a resource-based selling model: customers rent nodes from the provider and pay per-node per-unit-time. This selling model places the burden upon customers to predict their job resource requirements and durations. Inaccurate prediction by customers can result in over-provisioning of resources, or under-provisioning and poor job perf...
Publication date: 2011